Neural networks and systems for decoding encoded data

ABSTRACT

Examples described herein utilize multi-layer neural networks to decode encoded data (e.g., data encoded using one or more encoding techniques). The neural networks may have nonlinear mapping and distributed processing capabilities which may be advantageous in many systems employing the neural network decoders. In this manner, neural networks described herein may be used to implement error correction coding (ECC) decoders.

CROSS REFERENCE TO RELATED APPLICATIONS

This application is a continuation of pending U.S. patent application Ser. No. 16/233,576 filed Dec. 27, 2018. The aforementioned application is incorporated herein by reference, in its entirety, for any purpose.

TECHNICAL FIELD

Examples described herein relate to neural networks for use in decoding encoded data. Examples of neural networks are described which may be used with error-correcting coding (ECC) memory, where a neural network may be used to decode encoded data.

BACKGROUND

Error correction coding (ECC) may be used in a variety of applications, such as memory devices or wireless baseband circuitry. Generally, error correction coding techniques may encode original data with additional bits to describe the original bits which are intended to be stored, retrieved, and/or transmitted. The additional bits may be stored together with the original bits. Accordingly, there may be L bits of original data to be stored and/or transmitted. An encoder may provide N-L additional bits, such that the encoded data may be N bits worth of data. The original bits may be stored as the original bits, or may be changed by the encoder to form the encoded N bits of stored data. A decoder may decode the N bits to retrieve and/or estimate the original L bits, which may be corrected in some examples in accordance with the ECC technique.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic illustration of a neural network arranged in accordance with examples described herein.

FIG. 2 is a schematic illustration of a hardware implementation of a neural network arranged in accordance with examples described herein.

FIG. 3 is a schematic illustration of an apparatus arranged in accordance with examples described herein.

FIG. 4 is a flowchart of a method arranged in accordance with examples described herein.

DETAILED DESCRIPTION

Multi-layer neural networks may be used to decode encoded data (e.g., data encoded using one or more encoding techniques). The neural networks may have nonlinear mapping and distributed processing capabilities which may be advantageous in many systems employing the neural network decoders. In this manner, neural networks described herein may be used to implement error correction coding (ECC) decoders.

An encoder may have L bits of input data (a1, a2, . . . aL). The encoder may encode the input data in accordance with an encoding technique to provide N bits of encoded data (b1, b2, . . . bN). The encoded data may be stored and/or transmitted, or some other action taken with the encoded data, which may introduce noise into the data. Accordingly, a decoder may receive a version of the N bits of encoded data (x1, x2, . . . xN). The decoder may decode the received encoded data into an estimate of the L bits of original data (y1, y2, . . . yL).

Examples of wireless baseband circuitry may utilize error correction coding (such as low density parity check coding, LDPC). An encoder may add particularly selected N-L bits to original data of L bits, which may allow a decoder to decode the data and reduce and/or minimize errors introduced by noise, interference, and other practical factors in the data storage and transmission.

There are a variety of particular error correction coding techniques, including low density parity check coding (LDPC), Reed-Solomon coding, Bose-Chaudhuri-Hocquenghem (BCH), and Polar coding. The use of these coding techniques, however, may come at the cost of reduced frequency, channel, and/or storage resource usage efficiency and increased processing complexity. For example, the use of coding techniques may increase the amount of data which may be stored and/or transmitted. Moreover, processing resources may be necessary to implement the encoding and decoding. In some examples, the decoder may be one of the processing blocks that cost the most computational resources in wireless baseband circuitry and/or memory controllers, which may reduce the desirability of existing decoding schemes in many emerging applications such as Internet of Things (IoT) and/or tactile internet where ultra-low power consumption and ultra-low latency are highly desirable.

Examples described herein utilize multi-layer neural networks to decode encoded data (e.g., data encoded using one or more encoding techniques). The neural networks have nonlinear mapping and distributed processing capabilities which may be advantageous in many systems employing the neural network decoders.

FIG. 1 is a schematic illustration of a neural network arranged in accordance with examples described herein. The neural network 100 includes three stages (e.g., layers). While three stages are shown in FIG. 1, any number of stages may be used in other examples. A first stage of neural network 100 includes node 118, node 120, node 122, and node 124. A second stage of neural network 100 includes combiner 102, combiner 104, combiner 106, and combiner 108. A third stage of neural network 100 includes combiner 110, combiner 112, combiner 114, and combiner 116. Additional, fewer, and/or different components may be used in other examples.

Generally, a neural network may be used including multiple stages of nodes. The nodes may be implemented using processing elements which may execute one or more functions on inputs received from a previous stage and provide the output of the functions to the next stage of the neural network. The processing elements may be implemented using, for example, one or more processors, controllers, and/or custom circuitry, such as an application specific integrated circuit (ASIC) and/or a field programmable gate array (FPGA). The processing elements may be implemented as combiners and/or summers and/or any other structure for performing functions allocated to the processing element. In some examples, certain of the processing elements of neural networks described herein perform weighted sums, e.g., may be implemented using one or more multiplication/accumulation units, which may be implemented using processor(s) and/or other circuitry.

In the example of FIG. 1, the neural network 100 may have an input layer, which may be a first stage of the neural network including node 118, node 120, node 122, and node 124. The nodes 118, 120, 122, and 124 may implement a linear function which may provide the input signals (e.g., x1(n), x2(n), . . . xN(n)) to another stage of the neural network (e.g., a ‘hidden stage’ or ‘hidden layer’). Accordingly, in the example of FIG. 1, N bits of input data may be provided to an input stage (e.g., an input layer) of a neural network during operation. In some examples, the input data may be data encoded in accordance with an encoding technique (e.g., low density parity check coding (LDPC), Reed-Solomon coding, Bose-Chaudhuri-Hocquenghem (BCH), and/or Polar coding). The N bits of input data may be output by the first stage of the neural network 100 to a next stage of the neural network 100. In some examples, the connection between the first stage and the second stage of the neural network 100 may not be weighted; e.g., processing elements in the second stage may receive bits unaltered from the first stage in some examples. Each of the input data bits may be provided to multiple ones of the processing elements in the next stage. While an input layer is shown, in some examples, the input layer may not be present.

The neural network 100 may have a next layer, which may be referred to as a ‘hidden layer’ in some examples. The next layer may include combiner 102, combiner 104, combiner 106, and combiner 108, although any number of elements may be used. While the processing elements in the second stage of the neural network 100 are referred to as combiners, generally the processing elements in the second stage may perform a nonlinear activation function using the input data bits received at the processing element. Any number of nonlinear activation functions may be used. Examples of functions which may be used include Gaussian functions, such as

${{f(r)} = {\exp\left( {- \frac{r^{2}}{\sigma^{2}}} \right)}}.$

Examples of functions which may be used include multi-quadratic functions, such as ƒ(r)=(r²+σ²)^(1/2). Examples of functions which may be used include inverse multi-quadratic functions, such as ƒ(r)=(r²+σ²)^(−1/2). Examples of functions which may be used include thin-plate-spline functions, such as ƒ(r)=r² log (r). Examples of functions which may be used include piece-wise linear functions, such as

$f(r) = \frac{1}{2}\left( \left| r + 1 \right| - \left| r - 1 \right| \right).$

Examples of functions which may be used include cubic approximation functions, such as

$f(r) = \frac{1}{2}\left( \left| r^{3} + 1 \right| - \left| r^{3} - 1 \right| \right).$

In these example functions, σ represents a real parameter (e.g., a scaling parameter) and r is the distance between the input vector and the center vector. The distance may be measured using any of a variety of metrics, including the Euclidean norm.
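By way of illustration only, the example functions listed above may be sketched in Python as follows. The function names are hypothetical and are not taken from the examples described herein. Two assumptions are made: the thin-plate-spline function is taken to be 0 at r=0 (its limiting value), and the cubic approximation is assumed to mirror the piece-wise linear form, with r³−1 in its second term.

```python
import math

def gaussian(r, sigma):
    # f(r) = exp(-r^2 / sigma^2)
    return math.exp(-(r * r) / (sigma * sigma))

def multiquadric(r, sigma):
    # f(r) = (r^2 + sigma^2)^(1/2)
    return math.sqrt(r * r + sigma * sigma)

def inverse_multiquadric(r, sigma):
    # f(r) = (r^2 + sigma^2)^(-1/2)
    return 1.0 / math.sqrt(r * r + sigma * sigma)

def thin_plate_spline(r):
    # f(r) = r^2 log(r); assumed to be 0 at r == 0 (the limiting value)
    return 0.0 if r == 0 else r * r * math.log(r)

def piecewise_linear(r):
    # f(r) = (|r + 1| - |r - 1|) / 2, which saturates at -1 and +1
    return 0.5 * (abs(r + 1) - abs(r - 1))

def cubic_approximation(r):
    # Assumed form: f(r) = (|r^3 + 1| - |r^3 - 1|) / 2
    return 0.5 * (abs(r ** 3 + 1) - abs(r ** 3 - 1))

def euclidean_distance(x, c):
    # r = ||x - c|| using the Euclidean norm
    return math.sqrt(sum((xi - ci) ** 2 for xi, ci in zip(x, c)))
```

In such a sketch, r may be obtained from euclidean_distance and then passed to whichever function is selected for a given hidden layer element.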

Each element in the ‘hidden layer’ may receive as inputs selected bits (e.g., some or all) of the input data. For example, each element in the ‘hidden layer’ may receive as inputs the outputs of multiple selected elements (e.g., some or all elements) in the input layer. For example, the combiner 102 may receive as inputs the output of node 118, node 120, node 122, and node 124. While a single ‘hidden layer’ is shown by way of example in FIG. 1, any number of ‘hidden layers’ may be present and may be connected in series. While four elements are shown in the ‘hidden layer’, any number may be used, and they may be the same or different in number than the number of nodes in the input layer and/or the number of nodes in any other hidden layer. The nodes in the hidden layer may evaluate at least one non-linear function using combinations of the data received at the hidden layer node (e.g., element). In this manner, the hidden layer may provide intermediate data at an output of one or more hidden layers.

The neural network 100 may have an output layer. The output layer in the example of FIG. 1 may include combiner 110, combiner 112, combiner 114, and combiner 116, although any number of elements may be used. While the processing elements in the output stage of the neural network 100 are referred to as combiners, generally the processing elements in the output stage may perform any combination or other operation using data bits received from a last ‘hidden layer’ in the neural network. Each element in the output layer may receive as inputs selected bits (e.g., some or all) of the data provided by a last ‘hidden layer’. For example, the combiner 110 may receive as inputs the outputs of combiner 102, combiner 104, combiner 106, and combiner 108. The connections between the hidden layer and the output layer may be weighted. For example, a set of weights W may be specified. There may be one weight for each connection between a hidden layer node and an output layer node in some examples. In some examples, there may be one weight for each hidden layer node that may be applied to the data provided by that node to each connected output node. Other distributions of weights may also be used. The weights may be multiplied with the output of the hidden layer before the output is provided to the output layer. In this manner, the output layer may perform a sum of weighted inputs. Accordingly, an output of the neural network 100 (e.g., the outputs of the output layer) may be referred to as a weighted sum. The output layer may accordingly combine intermediate data received from one or more hidden layers using weights to provide output data.

In some examples, the neural network 100 may be used to provide L output bits which represent decoded data corresponding to N input bits. For example, in the example of FIG. 1, N input bits are shown (x₁(n), x₂(n), . . . x_(N)(n)) and L output bits are provided (y₁(n), y₂(n), . . . y_(L)(n)). The neural network 100 may be trained such that the weights W used and/or the functions provided by the elements of the hidden layers cause the neural network 100 to provide output bits which represent the decoded data corresponding to the N encoded input bits. The input bits may have been encoded with an encoding technique, and the weights and/or functions provided by the elements of the hidden layers may be selected in accordance with the encoding technique. Accordingly, the neural network 100 may be trained multiple times, once for each encoding technique that may be used to provide the neural network 100 with input data.

Examples of neural networks may be trained. Training generally refers to the process of determining weights, functions, and/or other attributes to be utilized by a neural network to create a desired transformation of input data to output data. In some examples, neural networks described herein may be trained to transform encoded input data to decoded data (e.g., an estimate of the decoded data). In some examples, neural networks described herein may be trained to transform noisy encoded input data to decoded data (e.g., an estimate of the decoded data). In this manner, neural networks may be used to reduce and/or correct errors which may be introduced by noise present in the input data. In some examples, neural networks described herein may be trained to transform noisy encoded input data to encoded data with reduced noise. The encoded data with reduced noise may then be provided to any decoder (e.g., a neural network and/or other decoder) for decoding of the encoded data. In this manner, neural networks may be used to reduce and/or correct errors which may be introduced by noise.

Training as described herein may be supervised or un-supervised in various examples. In some examples, training may occur using known pairs of anticipated input and desired output data. For example, training may utilize known encoded data and decoded data pairs to train a neural network to decode subsequent encoded data into decoded data. In some examples, training may utilize known noisy encoded data and decoded data pairs to train a neural network to decode subsequent noisy encoded data into decoded data. In some examples, training may utilize known noisy encoded data and encoded data pairs to train a neural network to provide encoded data having less noise than the input noisy encoded data. Examples of training may include determining weights to be used by a neural network, such as neural network 100 of FIG. 1. In some examples, the same neural network hardware is used during training as will be used during operation. In some examples, however, different neural network hardware may be used during training, and the weights, functions, or other attributes determined during training may be stored for use by other neural network hardware during operation.

Examples of training can be described mathematically. For example, consider input data at a time instant (n), given as:

$X(n) = [x_{1}(n), x_{2}(n), \ldots, x_{N}(n)]^{T}$

The center vector for each element in hidden layer(s) of the neural network (e.g., combiner 102, combiner 104, combiner 106, and combiner 108 of FIG. 1) may be denoted as C_(i) (for i=1, 2, . . . , H, where H is the number of elements in the hidden layer).

The output of each element in a hidden layer may then be given as:

h _(i)(n)=f _(i)(∥X(n)−C _(i)∥) for(i=1,2, . . . , H)  (1)

The connections between a last hidden layer and the output layer may be weighted. Each element in the output layer may have a linear input-output relationship such that it may perform a summation (e.g., a weighted summation). Accordingly, an output of the i'th element in the output layer at time n may be written as:

y _(i)(n)=Σ_(j=1) ^(H) W _(ij) h _(j)(n)=Σ_(j=1) ^(H) W _(ij) f_(j)(∥X(n)−C _(j)∥)  (2)

for (i=1, 2, . . . , L), where L is the number of elements in the output layer and W_(ij) is the connection weight between the j'th element in the hidden layer and the i'th element in the output layer.
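By way of illustration only, Equations 1 and 2 may be sketched in Python as follows. The sketch assumes a Gaussian function with a single shared scaling parameter for every hidden layer element. The function names and the numeric values of the centers and weights are hypothetical placeholders, not trained parameters.

```python
import math

def hidden_outputs(x, centers, sigma):
    """Equation 1 with a Gaussian f: h_i = exp(-||x - C_i||^2 / sigma^2)."""
    outputs = []
    for c in centers:
        r_squared = sum((xi - ci) ** 2 for xi, ci in zip(x, c))
        outputs.append(math.exp(-r_squared / sigma ** 2))
    return outputs

def network_outputs(x, centers, weights, sigma):
    """Equation 2: y_i = sum over j of W_ij * h_j, for each of the L outputs."""
    h = hidden_outputs(x, centers, sigma)
    return [sum(w_ij * h_j for w_ij, h_j in zip(row, h)) for row in weights]

# Placeholder parameters: H = 2 center vectors of length N = 3, L = 1 output.
centers = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]
weights = [[0.0, 1.0]]  # one row per output element, one column per hidden element
y = network_outputs([1.0, 1.0, 0.0], centers, weights, sigma=1.0)
print(y)
```

Here weights is stored as an L×H list of rows so that row i holds the weights W_(ij) feeding the i'th output element.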

Generally, a neural network architecture (e.g., the neural network 100 of FIG. 1) may include a number of elements and may have center vectors which are distributed in the input domain such that the neural network may approximate nonlinear multidimensional functions and therefore may approximate forward mapping and inverse mapping between two code words (e.g., from an N-bit input to an L-bit output). Generally, the choice of transfer function used by elements in the hidden layer may not affect the mapping performance of the neural network, and accordingly, a function may be used which may be implemented conveniently in hardware in some examples. For example, a thin-plate-spline function and/or a Gaussian function may be used in various examples and may both provide adequate approximation capabilities. Other functions may also be used.

Examples of neural networks may accordingly be specified by attributes (e.g., parameters). In some examples, two sets of parameters may be used to specify a neural network: connection weights and center vectors (e.g., thresholds). The parameters may be determined from selected input data (e.g., encoded input data) by solving an optimization function. An example optimization function may be given as:

$E = \sum_{n=1}^{M} \left\| Y(n) - \hat{Y}(n) \right\|^{2}$  (3)

where M is the number of training input vectors (e.g., training encoded data inputs), Y(n) is an output vector computed from the sample input vector using Equations 1 and 2 above, and $\hat{Y}(n)$ is the corresponding desired (e.g., known) output vector. The output vector Y(n) may be written as:

$Y(n) = [y_{1}(n), y_{2}(n), \ldots, y_{L}(n)]^{T}$

Various methods (e.g., gradient descent procedures) may be used to solve the optimization function. However, in some examples, another approach may be used to determine the parameters of a neural network, which may generally include two steps: (1) determining center vectors C_(i) (i=1, 2, . . . , H) and (2) determining the weights.

In some examples, the center vectors may be chosen from a subset of available sample vectors. In such examples, the number of elements in the hidden layer(s) may be relatively large to cover the entire input domain. Accordingly, in some examples, it may be desirable to apply k-means cluster algorithms. Generally, k-means cluster algorithms distribute the center vectors according to the natural measure of the attractor (e.g., if the density of the data points is high, so is the density of the centers). k-means cluster algorithms may find a set of cluster centers and partition the training samples into subsets. Each cluster center may be associated with one of the H hidden layer elements in this network. The data may be partitioned in such a way that the training points are assigned to the cluster with the nearest center. Each cluster center may correspond to one of the minima of an optimization function. An example optimization function for use with a k-means cluster algorithm may be given as:

E _(k_means)=Σ_(j=1) ^(H)Σ_(n=1) ^(M) B _(jn) ∥X(n)−C _(j)∥²  (4)

where B_(jn) is the cluster partition or membership function forming an H×M matrix. Each column may represent an available sample vector (e.g., known input data) and each row may represent a cluster. Each column may include a single ‘1’ in the row corresponding to the cluster nearest to that training point, and zeros elsewhere.

The center of each cluster may be initialized to a different randomly chosen training point. Then each training example may be assigned to the cluster center nearest to it. When all training points have been assigned, the average position of the training points for each cluster may be found and the cluster center may be moved to that point. The cluster centers may become the desired centers of the hidden layer elements.
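By way of illustration only, the clustering steps described above may be sketched in Python as follows. The function names, the fixed number of iterations, and the handling of an empty cluster (its center is left in place) are assumptions of the sketch rather than features of the examples described herein.

```python
import random

def squared_distance(x, c):
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def k_means(samples, num_centers, iterations=20, seed=0):
    rng = random.Random(seed)
    # Initialize each center to a different randomly chosen training point.
    centers = [list(s) for s in rng.sample(samples, num_centers)]
    for _ in range(iterations):
        # Assign every training point to the cluster with the nearest center.
        clusters = [[] for _ in range(num_centers)]
        for x in samples:
            nearest = min(range(num_centers),
                          key=lambda j: squared_distance(x, centers[j]))
            clusters[nearest].append(x)
        # Move each center to the average position of its assigned points.
        for j, members in enumerate(clusters):
            if members:  # assumption: an empty cluster keeps its old center
                dim = len(members[0])
                centers[j] = [sum(m[d] for m in members) / len(members)
                              for d in range(dim)]
    return centers

samples = [[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]]
print(sorted(k_means(samples, num_centers=2)))
```

The returned centers may then serve as the center vectors C_(i) of the H hidden layer elements, with num_centers equal to H.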

In some examples, for some transfer functions (e.g., the Gaussian function), the scaling factor σ may be determined, and may be determined before determining the connection weights. The scaling factor may be selected to cover the training points to allow a smooth fit of the desired network outputs. Generally, this means that any point within the convex hull of the processing element centers may significantly activate more than one element. To achieve this goal, each hidden layer element may activate at least one other hidden layer element to a significant degree. An appropriate method to determine the scaling parameter σ may be based on the P-nearest neighbor heuristic, which may be given as:

$\sigma_{i} = \frac{1}{P}\sum_{j=1}^{P} \left\| C_{j} - C_{i} \right\|^{2} \quad (i = 1, 2, \ldots, H)$

where C_(j) (for j=1, 2, . . . , P) are the P-nearest neighbors of C_(i).
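By way of illustration only, the P-nearest neighbor heuristic may be sketched in Python as follows. The sketch follows the expression as written above, i.e., the mean of the squared distances from each center to its P nearest other centers. The function name is hypothetical.

```python
def p_nearest_sigma(centers, p):
    """sigma_i = (1/P) * sum over the P nearest neighbors C_j of ||C_j - C_i||^2."""
    sigmas = []
    for i, c_i in enumerate(centers):
        # Squared distances from C_i to every other center, smallest first.
        sq_dists = sorted(
            sum((a - b) ** 2 for a, b in zip(c_j, c_i))
            for j, c_j in enumerate(centers) if j != i
        )
        sigmas.append(sum(sq_dists[:p]) / p)
    return sigmas

centers = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
print(p_nearest_sigma(centers, p=1))
```

The sketch assumes P is smaller than the number of centers H, so that each center has at least P neighbors.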

The connection weights may additionally or instead be determined during training. In an example of a neural network, such as neural network 100 of FIG. 1, having one hidden layer of weighted connections and output elements which are summation units, the optimization function of Equation 3 may become a linear least-squares problem once the center vectors and the scaling parameter have been determined. The linear least-squares problem may be written as

$\min_{W} \sum_{n=1}^{M} \left\| Y(n) - \hat{Y}(n) \right\|^{2} = \min_{W} \left\| WF - \hat{Y} \right\|^{2}$  (5)

where W={W_(ij)} is the L×H matrix of the connection weights, F is an H×M matrix of the outputs of the hidden layer processing elements and whose matrix elements are computed using

F _(in) =f _(i)(∥X(n)−C _(i)∥)(i=1,2, . . . ,H;n=1,2, . . . ,M)

and $\hat{Y} = [\hat{Y}(1), \hat{Y}(2), \ldots, \hat{Y}(M)]$ is the L×M matrix of the desired (e.g., known) outputs. The connection weight matrix W may be found from Equation 5 and may be written as follows:

$\hat{W} = \hat{Y}F^{+} = \hat{Y}\lim_{\alpha \rightarrow 0} F^{T}\left( FF^{T} + \alpha I \right)^{-1}$

where F⁺ is the pseudo-inverse of F. In this manner, the above may provide a batch-processing method for determining the connection weights of a neural network. It may be applied, for example, where all input sample sets are available at one time. In some examples, each new sample set may become available recursively, such as in the recursive-least-squares (RLS) algorithms. In such cases, the connection weights may be determined as follows.

First, connection weights may be initialized to any value (e.g., random values may be used).

The output vector Y(n) may be computed using Equation 2. The error term e_(i)(n) of each output element in the output layer may be computed as follows:

$e_{i}(n) = y_{i}(n) - \hat{y}_{i}(n) \quad (i = 1, 2, \ldots, L)$

The connection weights may then be adjusted based on the error term, for example as follows:

$W_{ij}(n+1) = W_{ij}(n) - \gamma e_{i}(n) f_{j}\left( \left\| X(n) - C_{j} \right\| \right) \quad (i = 1, 2, \ldots, L;\ j = 1, 2, \ldots, H)$

where γ is the learning-rate parameter which may be fixed or time-varying.

The total error may be computed based on the output from the output layer and the desired (known) data:

$\epsilon = \left\| Y(n) - \hat{Y}(n) \right\|^{2}$

The process may be iterated by again calculating a new output vector, error term, and again adjusting the connection weights. The process may continue until weights are identified which reduce the error to equal to or less than a threshold error.
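By way of illustration only, the iterative procedure above may be sketched in Python as follows. The sketch assumes a Gaussian function with a shared scaling parameter, initializes the weights to zero (any initial value may be used), and accumulates the squared error over one pass through the samples before comparing it with the threshold. With the error term defined as the computed output minus the desired output, the sketch subtracts the correction so that each step reduces the squared error. All names and default values are hypothetical.

```python
import math

def hidden_outputs(x, centers, sigma):
    return [math.exp(-sum((xi - ci) ** 2 for xi, ci in zip(x, c)) / sigma ** 2)
            for c in centers]

def train_recursive(samples, targets, centers, sigma, gamma=0.5,
                    threshold=1e-6, max_epochs=5000):
    num_outputs, num_hidden = len(targets[0]), len(centers)
    weights = [[0.0] * num_hidden for _ in range(num_outputs)]
    total_error = float("inf")
    for _ in range(max_epochs):
        total_error = 0.0
        for x, desired in zip(samples, targets):
            h = hidden_outputs(x, centers, sigma)
            y = [sum(w * hj for w, hj in zip(row, h)) for row in weights]
            # Error term: computed output minus desired output.
            e = [yi - di for yi, di in zip(y, desired)]
            for i in range(num_outputs):
                for j in range(num_hidden):
                    # Step against the error so the squared error shrinks.
                    weights[i][j] -= gamma * e[i] * h[j]
            total_error += sum(ei * ei for ei in e)
        if total_error <= threshold:
            break  # error is at or below the threshold error
    return weights, total_error

samples = [[0.0], [1.0]]
targets = [[0.0], [1.0]]
centers = [[0.0], [1.0]]
weights, err = train_recursive(samples, targets, centers, sigma=0.5)
print(weights, err)
```

The learning-rate parameter gamma is fixed in this sketch; a time-varying value could be substituted inside the loop.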

Accordingly, the neural network 100 of FIG. 1 may be trained to determine parameters (e.g., weights) for use by the neural network 100 to perform a particular mapping between input and output data. For example, training the neural network 100 may provide one set of parameters to use when decoding encoded data that had been encoded with a particular encoding technique (e.g., low density parity check coding (LDPC), Reed-Solomon coding, Bose-Chaudhuri-Hocquenghem (BCH), and/or Polar coding). The neural network 100 (and/or another neural network) may be trained multiple times, using different known input/output data pairs, for example. Multiple trainings may result in multiple sets of connection weights. For example, a different set of weights may be determined for each of multiple encoding techniques; e.g., one set of weights may be determined for use with decoding LDPC encoded data and another set of weights may be determined for use with decoding BCH encoded data.
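By way of illustration only, one way a set of weights might be determined in batch form is sketched below in Python, using the regularized expression W = Ŷ Fᵀ (F Fᵀ + αI)⁻¹ described earlier. The sketch assumes a small fixed α in place of the limit α→0, and solves the resulting linear system by Gauss-Jordan elimination. The function names are hypothetical.

```python
def gaussian_solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n, m = len(A), len(B[0])
    aug = [list(A[i]) + list(B[i]) for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [[aug[i][n + k] / aug[i][i] for k in range(m)] for i in range(n)]

def solve_weights(F, Y_hat, alpha=1e-9):
    """Return W (L x H) for F (H x M) and desired outputs Y_hat (L x M)."""
    H, M = len(F), len(F[0])
    # A = F F^T + alpha I, an H x H symmetric matrix.
    A = [[sum(F[i][n] * F[k][n] for n in range(M)) + (alpha if i == k else 0.0)
          for k in range(H)] for i in range(H)]
    # B = F Y_hat^T (H x L); because A is symmetric, A Z = B gives Z = W^T.
    B = [[sum(F[i][n] * row[n] for n in range(M)) for row in Y_hat]
         for i in range(H)]
    Z = gaussian_solve(A, B)
    return [[Z[j][i] for j in range(H)] for i in range(len(Y_hat))]

F = [[1.0, 1.0, 0.0], [0.0, 1.0, 1.0]]   # H = 2 hidden outputs, M = 3 samples
Y_hat = [[1.0, -1.0, -2.0]]              # L = 1 desired output per sample
print(solve_weights(F, Y_hat))
```

Running such a routine once per encoding technique, each time with that technique's training pairs, would yield one weight matrix per technique.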

Recall that the structure of neural network 100 of FIG. 1 is provided by way of example only. Other multilayer neural network structures may be used in other examples. Moreover, the training procedures described herein are also provided by way of example. Other training techniques (e.g., learning algorithms) may be used, for example, to address the local minimum problem and/or vanishing gradient problem. Determined weights and/or vectors for each decoder may be obtained by an off-line learning mode of the neural network, which may advantageously provide more resources and data.

In examples of supervised learning, the input training samples [x₁(n), x₂(n), . . . , x_(N)(n)] may be generated by passing the encoded samples [b₁(n), b₂(n), . . . , b_(N)(n)] through some noisy channels and/or adding noise. The supervised output samples may be the corresponding original code [a₁(n), a₂(n), . . . , a_(L)(n)] which may be used to generate [b₁(n), b₂(n), . . . , b_(N)(n)] by the encoder. Once these parameters are determined in offline mode, the desired decoded code-word can be obtained from input data utilizing the neural network (e.g., computing Equation 2), which may avoid complex iterations and feedback decisions used in traditional error-correcting decoding algorithms. In this manner, neural networks described herein may provide a reduction in processing complexity and/or latency, because some complexity has been transferred to an off-line training process which is used to determine the weights and/or functions which will be used. Further, the same neural network (e.g., the neural network 100 of FIG. 1) can be used to decode an input code-word encoded by any of multiple error correction encoders by selecting different weights that were obtained by the training for the particular error correction technique employed. In this manner, neural networks may serve as a universal decoder for multiple encoder types.
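By way of illustration only, the generation of supervised training pairs may be sketched in Python as follows. The single-parity-bit encoder is a hypothetical stand-in chosen for brevity and is not one of the encoding techniques named herein. The bit-flip noise model, the function names, and the parameters are likewise assumptions of the sketch.

```python
import itertools
import random

def encode_single_parity(a):
    """Hypothetical stand-in encoder: append one even-parity bit (N = L + 1)."""
    return list(a) + [sum(a) % 2]

def make_training_pairs(num_bits, flip_prob, copies, seed=0):
    rng = random.Random(seed)
    pairs = []
    # Enumerate every L-bit original code word a.
    for a in itertools.product([0, 1], repeat=num_bits):
        b = encode_single_parity(a)
        for _ in range(copies):
            # Noisy channel: flip each encoded bit with probability flip_prob.
            x = [bit ^ 1 if rng.random() < flip_prob else bit for bit in b]
            pairs.append((x, list(a)))  # (noisy encoded input, desired output)
    return pairs

pairs = make_training_pairs(num_bits=3, flip_prob=0.1, copies=4)
print(len(pairs))
```

Each pair holds a noisy N-bit input x and the L-bit original data a that a trained network would be expected to reproduce.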

FIG. 2 is a schematic diagram of a hardware implementation of a neural network 200. The hardware implementation of the neural network 200 may be used, for example, to implement one or more neural networks, such as the neural network 100 of FIG. 1. The hardware implementation of the neural network 200 includes a processing unit 230 having two stages. A first stage may include multiplication/accumulation unit 204, table look-up 216, multiplication/accumulation unit 206, table look-up 218, multiplication/accumulation unit 208, and table look-up 220. The processing unit 230 includes a second stage, coupled to the first stage in series, which includes multiplication/accumulation unit 210, table look-up 222, multiplication/accumulation unit 212, table look-up 224, multiplication/accumulation unit 214, and table look-up 226. The hardware implementation of the neural network 200 further includes a mode configuration control 202 and a weight memory 228.

The processing unit 230 may receive input data (e.g., x₁(n), x₂(n), . . . , x_(N)(n)) from a memory device, communication transceiver and/or other component. In some examples, the input data may be encoded in accordance with an encoding technique. The processing unit 230 may function to process the encoded input data to provide output data (e.g., y₁(n), y₂(n), . . . , y_(L)(n)). The output data may be the decoded data (e.g., an estimate of the decoded data) corresponding to the encoded input data in some examples. The output data may be the data corresponding to the encoded input data, but having reduced and/or modified noise.

While two stages are shown in FIG. 2 (a first stage including N multiplication/accumulation units and N table look-ups, and a second stage including L multiplication/accumulation units and L table look-ups), any number of stages, and numbers of elements in each stage, may be used. In the example of FIG. 2, one multiplication/accumulation unit followed by one table look-up may be used to implement a processing element of FIG. 1. For example, the multiplication/accumulation unit 204 coupled in series to the table look-up 216 may be used to implement the combiner 102 of FIG. 1.

Generally, each multiplication/accumulation unit of FIG. 2 may be implemented using one or more multipliers and one or more adders. Each of the multiplication/accumulation units of FIG. 2 may include multiple multipliers, multiple accumulation units, and/or multiple adders. Any one of the multiplication/accumulation units of FIG. 2 may be implemented using an arithmetic logic unit (ALU). In some examples, any one of the multiplication/accumulation units may include one multiplier and one adder that each perform, respectively, multiple multiplications and multiple additions. The input-output relationship of an example multiplication/accumulation unit may be written as:

Z _(out)=Σ_(i=1) ^(I) W _(i) *Z _(in)(i)  (6)

where “I” is the number of multiplications to be performed by the unit, W_(i) refers to the coefficients to be used in the multiplications, and Z_(in)(i) is a factor for multiplication which may be, for example, input to the system and/or stored in one or more of the table look-ups.
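By way of illustration only, the input-output relationship of Equation 6 may be sketched in Python as a software analogue of a multiplication/accumulation unit. The function name and the length check are assumptions of the sketch.

```python
def multiply_accumulate(weights, inputs):
    """Equation 6: Z_out = sum over i = 1..I of W_i * Z_in(i)."""
    if len(weights) != len(inputs):
        raise ValueError("weights and inputs must have the same length")
    z_out = 0.0
    for w_i, z_i in zip(weights, inputs):
        z_out += w_i * z_i  # one multiplication and one accumulation per term
    return z_out

print(multiply_accumulate([1.0, 2.0, 3.0], [4.0, 5.0, 6.0]))
```

The loop performs I multiplications and I additions in sequence, analogous to a unit having one multiplier and one adder that are reused for each term.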

The table look-ups shown in FIG. 2 may generally perform a predetermined nonlinear mapping from input to output. For example, the table look-ups may be used to evaluate at least one non-linear function. In some examples, the contents and size of the various table look-ups depicted may be different and may be predetermined. In some examples, one or more of the table look-ups shown in FIG. 2 may be replaced by a single consolidated table look-up. Examples of nonlinear mappings (e.g., functions) which may be performed by the table look-ups include Gaussian functions, piece-wise linear functions, sigmoid functions, thin-plate-spline functions, multi-quadratic functions, cubic approximations, and inverse multi-quadratic functions. Examples of functions have been described with reference also to FIG. 1. In some examples, selected table look-ups may be by-passed and/or may be de-activated, which may allow a table look-up and its associated multiplication/accumulation unit to be considered a unity gain element. Generally, the table look-ups may be implemented using one or more look-up tables (e.g., stored in one or more memory device(s)), which may associate an input with the output of a non-linear function utilizing the input.
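By way of illustration only, a table look-up that associates an input with a precomputed output of a non-linear function may be sketched in Python as follows. The class name, the uniform sampling of the input range, the nearest-entry indexing, and the clamping of out-of-range inputs are assumptions of the sketch.

```python
import math

class TableLookUp:
    """Precomputed nonlinear mapping over a uniformly sampled input range."""

    def __init__(self, func, lo, hi, size):
        self.lo, self.hi = lo, hi
        self.step = (hi - lo) / (size - 1)
        # The table contents are predetermined when the look-up is built.
        self.table = [func(lo + k * self.step) for k in range(size)]

    def __call__(self, value):
        # Clamp to the tabulated range, then pick the nearest stored entry.
        clamped = min(max(value, self.lo), self.hi)
        index = round((clamped - self.lo) / self.step)
        return self.table[index]

# A Gaussian with unit scaling parameter, tabulated at 257 points on [0, 4].
gaussian_lut = TableLookUp(lambda r: math.exp(-r * r), 0.0, 4.0, 257)
print(gaussian_lut(1.0))
```

A larger table reduces the quantization error of the mapping at the cost of more memory, which is one reason the contents and size of different table look-ups may differ.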

Accordingly, the hardware implementation of neural network 200 may be used to convert an input code word (e.g., x₁(n), x₂(n), . . . , x_(N)(n)) to an output code word (e.g., y₁(n), y₂(n), . . . , y_(L)(n)). Examples of the conversion have been described herein with reference to FIG. 1. For example, the input code word may correspond to noisy encoded input data. The hardware implementation of the neural network 200 may utilize multiplication with corresponding weights (e.g., weights obtained during training) and look-up tables to provide the output code word. The output code word may correspond to the decoded data and/or to a version of the encoded input data having reduced and/or changed noise.

The mode configuration control 202 may be implemented using circuitry (e.g., logic), one or more processor(s), microcontroller(s), controller(s), or other elements. The mode configuration control 202 may select certain weights and/or other parameters from weight memory 228 and provide those weights and/or other parameters to one or more of the multiplication/accumulation units and/or table look-ups of FIG. 2. In some examples, weights and/or other parameters stored in weight memory 228 may be associated with particular encoding techniques. During operation, the mode configuration control 202 may be used to select weights and/or other parameters in weight memory 228 associated with a particular encoding technique (e.g., Reed-Solomon coding, BCH coding, LDPC coding, and/or Polar coding). The hardware implementation of neural network 200 may then utilize the selected weights and/or other parameters to function as a decoder for data encoded with that encoding technique. The mode configuration control 202 may select different weights and/or other parameters stored in weight memory 228 which are associated with a different encoding technique to alter the operation of the hardware implementation of neural network 200 to serve as a decoder for the different encoding technique. In this manner, the hardware implementation of neural network 200 may flexibly function as a decoder for multiple encoding techniques.
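By way of illustration only, the selection of weights by encoding technique may be sketched in Python as follows. The class name, the dictionary used to stand in for the weight memory, and the numeric values (placeholders, not trained weights) are assumptions of the sketch.

```python
class ModeConfigurationControl:
    """Selects a stored parameter set according to the encoding technique."""

    def __init__(self, weight_memory):
        # weight_memory maps an encoding technique name to its weight matrix.
        self.weight_memory = weight_memory

    def select(self, technique):
        if technique not in self.weight_memory:
            raise KeyError(f"no trained weights stored for {technique!r}")
        return self.weight_memory[technique]

weight_memory = {
    "LDPC": [[0.1, 0.2], [0.3, 0.4]],  # placeholder values only
    "BCH": [[0.5, 0.6], [0.7, 0.8]],   # placeholder values only
}
control = ModeConfigurationControl(weight_memory)
weights = control.select("BCH")
print(weights)
```

Switching the argument passed to select changes which weights are supplied to the processing unit, so the same processing structure may serve as a decoder for a different encoding technique.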

FIG. 3 is a schematic illustration of apparatus 300 (e.g., an integrated circuit, a memory device, a memory system, an electronic device or system, a smart phone, a tablet, a computer, a server, an appliance, a vehicle, etc.) according to an embodiment of the disclosure. The apparatus 300 may generally include a host 302 and a memory system 304.

The host 302 may be a host system such as a personal laptop computer, a desktop computer, a digital camera, a mobile telephone, or a memory card reader, among various other types of hosts. The host 302 may include a number of memory access devices (e.g., a number of processors). The host 302 may also be a memory controller, such as where memory system 304 is a memory device (e.g., a memory device having an on-die controller).

The memory system 304 may be a solid state drive (SSD) or other type of memory and may include a host interface 306, a controller 308 (e.g., a processor and/or other control circuitry), and a number of memory device(s) 314. The memory system 304, the controller 308, and/or the memory device(s) 314 may also be separately considered an “apparatus.” The memory device(s) 314 may include a number of solid state memory devices such as NAND flash devices, which may provide a storage volume for the memory system 304. Other types of memory may also be used.

The controller 308 may be coupled to the host interface 306 and to the memory device(s) 314 via a plurality of channels to transfer data between the memory system 304 and the host 302. The interface 306 may be in the form of a standardized interface. For example, when the memory system 304 is used for data storage in the apparatus 300, the interface 306 may be a serial advanced technology attachment (SATA), peripheral component interconnect express (PCIe), or a universal serial bus (USB), among other connectors and interfaces. In general, interface 306 provides an interface for passing control, address, data, and other signals between the memory system 304 and the host 302 having compatible receptors for the interface 306.

The controller 308 may communicate with the memory device(s) 314 (which in some embodiments can include a number of memory arrays on a single die) to control data read, write, and erase operations, among other operations. The controller 308 may include a discrete memory channel controller for each channel (not shown in FIG. 3) coupling the controller 308 to the memory device(s) 314. The controller 308 may include a number of components in the form of hardware and/or firmware (e.g., one or more integrated circuits) and/or software for controlling access to the memory device(s) 314 and/or for facilitating data transfer between the host 302 and memory device(s) 314.

The controller 308 may include an ECC encoder 310 for encoding data bits written to the memory device(s) 314 using one or more encoding techniques. The ECC encoder 310 may include a single parity check (SPC) encoder, and/or an algebraic error correction circuit such as one of the group including a Bose-Chaudhuri-Hocquenghem (BCH) ECC encoder and/or a Reed-Solomon ECC encoder, among other types of error correction circuits. The controller 308 may further include an ECC decoder 312 for decoding encoded data, which may include identifying erroneous cells, converting erroneous cells to erasures, and/or correcting the erasures. The memory device(s) 314 may, for example, include one or more output buffers which may read selected data from memory cells of the memory device(s) 314. The output buffers may provide output data, which may be provided as encoded input data to the ECC decoder 312. The neural network 100 of FIG. 1 and/or the hardware implementation of neural network 200 of FIG. 2 may be used to implement the ECC decoder 312 of FIG. 3, for example. In various embodiments, the ECC decoder 312 may be capable of decoding data for each type of encoder in the ECC encoder 310. For example, the weight memory 228 of FIG. 2 may store parameters associated with multiple encoding techniques which may be used by the ECC encoder 310, such that the hardware implementation of neural network 200 may be used as a ‘universal decoder’ to decode input data encoded by the ECC encoder 310, using any of multiple encoding techniques available to the ECC encoder.
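As a concrete illustration of the simplest encoder type named above, a single parity check code appends one parity bit to the data bits so that the code word has even parity. The following minimal sketch uses hypothetical function names and is not the circuit of the ECC encoder 310; it shows only how one additional bit lets a single flipped bit be detected.

```python
def spc_encode(bits):
    """Append one even-parity bit to the data bits."""
    parity = sum(bits) % 2
    return list(bits) + [parity]

def spc_check(codeword):
    """Return True when the code word has even parity (no error detected)."""
    return sum(codeword) % 2 == 0

codeword = spc_encode([1, 0, 1])    # three data bits, one added parity bit
corrupted = [1, 1, 1, codeword[3]]  # same code word with the second bit flipped
```

Here `spc_check(codeword)` holds while `spc_check(corrupted)` does not. A single parity check detects an odd number of bit errors but cannot locate them, which is why stronger codes such as BCH or Reed-Solomon may be used where correction is needed.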

The ECC encoder 310 and the ECC decoder 312 may each be implemented using discrete components such as an application specific integrated circuit (ASIC) or other circuitry, or the components may reflect functionality provided by circuitry within the controller 308 that does not necessarily have a discrete physical form separate from other portions of the controller 308. Although illustrated as components within the controller 308 in FIG. 3, each of the ECC encoder 310 and ECC decoder 312 may be external to the controller 308 or have a number of components located within the controller 308 and a number of components located external to the controller 308.

The memory device(s) 314 may include a number of arrays of memory cells (e.g., non-volatile memory cells). The arrays can be flash arrays with a NAND architecture, for example. However, embodiments are not limited to a particular type of memory array or array architecture. Floating-gate type flash memory cells in a NAND architecture may be used, but embodiments are not so limited. The cells may be multi-level cells (MLC) such as triple level cells (TLC) which store three data bits per cell. The memory cells can be grouped, for instance, into a number of blocks including a number of physical pages. A number of blocks can be included in a plane of memory cells and an array can include a number of planes. As one example, a memory device may be configured to store 8 KB (kilobytes) of user data per page, 128 pages of user data per block, 2048 blocks per plane, and 16 planes per device.
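The user data capacity implied by the example geometry above can be worked out directly. The short calculation below assumes that 1 KB means 1024 bytes, which the text does not state, and uses variable names chosen for the example.

```python
KB = 1024  # assumption: 1 KB = 1024 bytes

page_bytes = 8 * KB       # 8 KB of user data per page
pages_per_block = 128
blocks_per_plane = 2048
planes_per_device = 16

block_bytes = page_bytes * pages_per_block      # user data per block
plane_bytes = block_bytes * blocks_per_plane    # user data per plane
device_bytes = plane_bytes * planes_per_device  # user data per device

print(block_bytes // KB**2, "MB per block")
print(plane_bytes // KB**3, "GB per plane")
print(device_bytes // KB**3, "GB per device")
```

Under that assumption, the example device stores 1 MB of user data per block, 2 GB per plane, and 32 GB per device.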

According to a number of embodiments, controller 308 may control encoding of a number of received data bits according to the ECC encoder 310 that allows for later identification of erroneous bits and the conversion of those erroneous bits to erasures. The controller 308 may also control programming the encoded number of received data bits to a group of memory cells in memory device(s) 314.

The apparatus shown in FIG. 3 may be implemented in any of a variety of products employing processors and memory including, for example, cameras, phones, wireless devices, displays, chip sets, set top boxes, gaming systems, vehicles, and appliances. Resulting devices employing the memory system may benefit from examples of neural networks described herein to perform their ultimate user function.

From the foregoing it will be appreciated that, although specific embodiments have been described herein for purposes of illustration, various modifications may be made while remaining within the scope of the claimed technology. Certain details are set forth herein to provide an understanding of described embodiments of technology. However, other examples may be practiced without various of these particular details. In some instances, well-known circuits, control signals, timing protocols, neural network structures, algorithms, and/or software operations have not been shown in detail in order to avoid unnecessarily obscuring the described embodiments. Other embodiments may be utilized, and other changes may be made, without departing from the spirit or scope of the subject matter presented here.

FIG. 4 is a flowchart of a method arranged in accordance with examples described herein. The example method may include block 402, which may be followed by block 404, which may be followed by block 406, which may be followed by block 408, which may be followed by block 410. Additional, fewer, and/or different blocks may be used in other examples, and the order of the blocks may be different in other examples.

Block 402 recites “receive known encoded and decoded data pairs, the encoded data encoded with a particular encoding technique.” The known encoded and decoded data pairs may be received by a computing device that includes a neural network, such as the neural network 100 of FIG. 1 and/or the neural network 200 of FIG. 2 and/or the ECC decoder 312 of FIG. 3. Signaling indicative of the set of data pairs may be provided to the computing device.

Block 404 may follow block 402. Block 404 recites “determine a set of weights for a neural network to decode data encoded with the particular encoding technique.” For example, a neural network (e.g., any of the neural networks described herein) may be trained using the encoded and decoded data pairs received in block 402. The weights may be numerical values, which, when used by the neural network, allow the neural network to output decoded data corresponding to encoded input data encoded with a particular encoding technique. The weights may be stored, for example, in the weight memory 228 of FIG. 2. In some examples, training may not be performed, and an initial set of weights may simply be provided to a neural network, e.g., based on training of another neural network.
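Determining weights from known data pairs may be illustrated with a deliberately small stand-in for a neural network. The sketch below is not the training procedure of the examples described herein: it assumes a hypothetical 3-bit repetition code, Gaussian noise with a standard deviation of 0.2, a single sigmoid node, and full-batch gradient descent on a cross-entropy error, none of which is specified in the text.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_pairs(count, noise_std, rng):
    """Known (noisy encoded, decoded) pairs for an assumed 3-bit repetition code."""
    pairs = []
    for i in range(count):
        bit = i % 2  # alternate 0 and 1 so both values are equally represented
        encoded = [bit + rng.gauss(0.0, noise_std) for _ in range(3)]
        pairs.append((encoded, bit))
    return pairs

def train(pairs, epochs=200, learning_rate=0.5):
    """Full-batch gradient descent on the cross-entropy error of one sigmoid node."""
    weights = [0.0, 0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0, 0.0, 0.0]
        grad_b = 0.0
        for encoded, bit in pairs:
            z = sum(w * x for w, x in zip(weights, encoded)) + bias
            error = sigmoid(z) - bit  # gradient of the error with respect to z
            for k, x in enumerate(encoded):
                grad_w[k] += error * x
            grad_b += error
        weights = [w - learning_rate * g / len(pairs)
                   for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b / len(pairs)
    return weights, bias

def decode(encoded, weights, bias):
    """Estimate the original bit from a (possibly noisy) encoded word."""
    z = sum(w * x for w, x in zip(weights, encoded)) + bias
    return 1 if sigmoid(z) >= 0.5 else 0

rng = random.Random(0)  # fixed seed so the example is repeatable
weights, bias = train(make_pairs(64, 0.2, rng))
```

The resulting `weights` and `bias` play the role of the set of weights determined in block 404; they could then be stored and later used to decode data encoded with the same technique.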

In some examples, multiple sets of data pairs may be received (e.g., in block 402), with each set corresponding to data encoded with a different encoding technique. Accordingly, multiple sets of weights may be determined (e.g., in block 404), each set corresponding to a different encoding technique. For example, one set of weights may be determined which may be used to decode data encoded in accordance with LDPC coding while another set of weights may be determined which may be used to decode data encoded with BCH coding.

Block 406 may follow block 404. Block 406 recites “receive data encoded with the particular encoding technique.” For example, data (e.g., signaling indicative of data) encoded with the particular encoding technique may be retrieved from a memory of a computing device and/or received using a wireless communications receiver. Any of a variety of encoding techniques may have been used to encode the data.

Block 408 may follow block 406. Block 408 recites “decode the data using the set of weights.” By processing the encoded data received in block 406 using the weights, which may have been determined in block 404, the decoded data may be determined. For example, any neural network described herein may be used to decode the encoded data (e.g., the neural network 100 of FIG. 1, neural network 200 of FIG. 2, and/or the ECC decoder 312 of FIG. 3). In some examples, a set of weights may be selected that is associated with the particular encoding technique used to encode the data received in block 406. The set of weights may be selected from among multiple available sets of weights, each for use in decoding data encoded with a different encoding technique.

Block 410 may follow block 408. Block 410 recites “writing the decoded data to or reading the decoded data from memory.” For example, data decoded in block 408 may be written to a memory, such as the memory 314 of FIG. 3. In some examples, instead of or in addition to writing the data to memory, the decoded data may be transmitted to another device (e.g., using wireless communication techniques). While block 410 recites memory, in some examples any storage device may be used.

In some examples, blocks 406-410 may be repeated for data encoded with different encoding techniques. For example, data may be received in block 406, encoded with one particular encoding technique (e.g., LDPC coding). A set of weights may be selected that is for use with LDPC coding and provided to a neural network for decoding in block 408. The decoded data may be obtained in block 410. Data may then be received in block 406, encoded with a different encoding technique (e.g., BCH coding). Another set of weights may be selected that is for use with BCH coding and provided to a neural network for decoding in block 408. The decoded data may be obtained in block 410. In this manner, one neural network may be used to decode data that had been encoded with multiple encoding techniques.

Examples described herein may refer to various components as “coupled” or signals as being “provided to” or “received from” certain components. It is to be understood that in some examples the components are directly coupled one to another, while in other examples the components are coupled with intervening components disposed between them. Similarly, signals may be provided directly to and/or received directly from the recited components without intervening components, but also may be provided to and/or received from the certain components through intervening components.

What is claimed is:
 1. An apparatus comprising: a first stage of combiners configured to receive encoded input data and further configured to implement a first function to provide first intermediate data; and a second stage of combiners configured to receive the first intermediate data and further configured to combine the first intermediate data using a set of predetermined weights to provide the encoded data with reduced noise.
 2. The apparatus of claim 1, further comprising: a third stage of combiners configured to receive the first intermediate data and implement a second function to provide second intermediate data to the second stage of combiners.
 3. The apparatus of claim 1, wherein the first function is a nonlinear function.
 4. The apparatus of claim 1, wherein the first stage of combiners and second stage of combiners comprises a first plurality of multiplication/accumulation units, the first plurality of multiplication/accumulation units each configured to multiply at least one bit of the encoded input data with at least one of the set of predetermined weights and sum multiple weighted bits of the encoded input data.
 5. The apparatus of claim 4, wherein the first stage of combiners further comprises a first plurality of table look-ups, the first plurality of table look-ups each configured to look-up at least one intermediate data value corresponding to an output of a respective one of the first plurality of multiplication/accumulation units based on at least one non-linear function.
 6. The apparatus of claim 5, wherein the at least one non-linear function comprises a Gaussian function, a piece-wise linear function, a sigmoid function, a thin-plate-spline function, a multiquadratic function, a cubic approximation, an inverse multi-quadratic function, or combinations thereof.
 7. The apparatus of claim 1, wherein the set of predetermined weights is based at least in part on an encoding technique associated with the encoded input data.
 8. The apparatus of claim 7, wherein the encoding technique comprises Reed-Solomon coding, Bose-Chaudhuri-Hocquenghem (BCH) coding, low-density parity check (LDPC) coding, Polar coding, or combinations thereof.
 9. The apparatus of claim 1, wherein the set of predetermined weights are based on training of a neural network using known noisy encoded data and encoded data pairs.
 10. The apparatus of claim 1, further comprising: an encoder configured to encode the input data using encoded bits in accordance with an encoding technique and to provide the encoded input data; and a memory configured to receive the encoded input data from the encoder and configured to store the encoded input data, wherein, in storing the encoded input data, noise is introduced into the encoded input data.
 11. A method comprising: transmitting signaling, from a memory of a computing device, indicative of data encoded with an encoding technique; and modifying, at a neural network of the computing device, the data encoded with an encoding technique using a set of weights to provide encoded data with reduced noise.
 12. The method of claim 11, further comprising: receiving, at the computing device, signaling indicative of a set of encoded data pairs comprising encoded data; and determining, for the neural network, the set of weights that modifies the encoded data using the signaling indicative of the set of encoded data pairs.
 13. The method of claim 12, wherein determining the set of weights comprises selecting weights resulting in a minimized value of an error function between an output of the neural network and known noisy encoded data.
 14. The method of claim 11, wherein modifying the data using the neural network using the set of weights to provide encoded data with reduced noise comprises: combining the data encoded with the encoding technique among the set of weights to provide the encoded data with reduced noise using a plurality of layers of the neural network, comprising an input layer, a hidden layer, an output layer, or combinations thereof.
 15. The method of claim 11, wherein the encoded data with reduced noise is an estimate of the encoded data relative to output of an encoder associated with the encoding technique.
 16. The method of claim 11, wherein the encoding technique comprises Reed-Solomon coding, Bose-Chaudhuri-Hocquenghem (BCH) coding, low-density parity check (LDPC) coding, Polar coding, or combinations thereof.
 17. The method of claim 16, wherein the neural network is trained multiple times, once for each encoding technique used.
 18. A memory system comprising: one or more output buffers configured to transmit noisy output data; and a neural network configured to receive the noisy output data, and configured to utilize initial weights selected based on encoded data pairs to provide an estimate of encoded data with reduced noise.
 19. The memory system of claim 18, wherein the noise is introduced in transmitting the output data from the output buffers.
 20. The memory system of claim 18, wherein the neural network is configured to use multiple stages of nodes to provide the estimate of the encoded data, the multiple stages of nodes comprising an input stage, a hidden stage, an output stage, or combinations thereof.